Combining Deep and Unsupervised Features for Multilingual Speech Emotion Recognition

Authors

Abstract

In this paper we present a Convolutional Neural Network for multilingual emotion recognition from spoken sentences. The purpose of this work was to build a model capable of recognising emotions by combining textual and acoustic information, and compatible with multiple languages. The derived model has an end-to-end deep architecture; hence it takes raw text and audio data and uses convolutional layers to extract a hierarchy of classification features. Moreover, we show how the trained model achieves good performance in different languages thanks to the usage of unsupervised features. As an additional remark, it is worth mentioning that our solution does not require the data to be word- or phoneme-aligned. The proposed model, PATHOSnet, was evaluated on multiple corpora (IEMOCAP, EmoFilm, SES and AESI). Before training, we tuned the hyper-parameters solely on the IEMOCAP corpus, which offers realistic recordings and transcriptions of sentences with emotional content in English. The final model turned out to provide state-of-the-art performance on some of the selected data sets over the four considered emotions.
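The abstract describes an end-to-end architecture in which convolutional layers extract features separately from text and audio, which are then fused for emotion classification. The following is a minimal NumPy sketch of that general bimodal-fusion idea; the filter counts, kernel widths, input dimensions, and late-fusion-by-concatenation design are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def conv1d_relu(x, kernels):
    """Valid 1-D convolution over time followed by ReLU.
    x: (in_channels, time), kernels: (out_channels, in_channels, width)."""
    out_ch, _, width = kernels.shape
    steps = x.shape[1] - width + 1
    out = np.zeros((out_ch, steps))
    for o in range(out_ch):
        for t in range(steps):
            out[o, t] = np.sum(kernels[o] * x[:, t:t + width])
    return np.maximum(out, 0.0)

def global_max_pool(x):
    """Collapse the time axis, keeping one value per filter."""
    return x.max(axis=1)

class BimodalEmotionNet:
    """Illustrative two-branch CNN: one branch over word embeddings,
    one over acoustic frames, fused by concatenation (assumed design)."""

    def __init__(self, emb_dim=50, acoustic_dim=13, n_filters=8,
                 n_classes=4, seed=0):
        rng = np.random.default_rng(seed)
        self.text_kernels = 0.1 * rng.standard_normal((n_filters, emb_dim, 3))
        self.audio_kernels = 0.1 * rng.standard_normal((n_filters, acoustic_dim, 5))
        self.W = 0.1 * rng.standard_normal((n_classes, 2 * n_filters))
        self.b = np.zeros(n_classes)

    def forward(self, text_emb, audio_feat):
        """text_emb: (emb_dim, n_words); audio_feat: (acoustic_dim, n_frames).
        Note no word/phoneme alignment between the two inputs is needed."""
        h_text = global_max_pool(conv1d_relu(text_emb, self.text_kernels))
        h_audio = global_max_pool(conv1d_relu(audio_feat, self.audio_kernels))
        h = np.concatenate([h_text, h_audio])       # late fusion
        logits = self.W @ h + self.b
        exps = np.exp(logits - logits.max())        # softmax over emotions
        return exps / exps.sum()
```

Because each branch pools over its own time axis before fusion, the text and audio sequences may have different lengths, which matches the paper's remark that no word- or phoneme-level alignment is required.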


Similar articles

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...


Unsupervised Deep Belief Features for Speech Translation

We present a novel formalism for introducing deep belief features to Hierarchical Machine Translation Model. The deep features are generated by unsupervised training of a deep belief network built with stacked sets of Restricted Boltzmann Machines. We show that our new deep feature based hierarchical model is better than the baseline hierarchical model with gains for two different languages pai...


Integrating multilingual articulatory features into speech recognition

The use of articulatory features, such as place and manner of articulation, has been shown to reduce the word error rate of speech recognition systems under different conditions and in different settings. For example, recognition systems based on articulatory features are more robust to noise and reverberation. In earlier work we showed that articulatory features can compensate for inter language variability...


Combining Speech and Gender Classification for Effective Emotion Recognition

The applications of emotion recognition in consumer electronics are increasing day by day. However, the accuracy and stability of the decisions made by appliances largely depend on the efficient recognition of these emotions. The performance may degrade drastically due to interfering noise. This paper proposes a method which may improve the accuracy significantly. Results have confirmed that th...


Speech Emotion Recognition Considering Local Dynamic Features

Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences. However, the expression of speech emotion is a dynamic process, which is reflected through dynamic durations, energies, and other prosodic information when one speaks. In this paper, a novel ...



Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-68790-8_10